Zero-Gradient-Sum Algorithms for Distributed Convex 
Optimization: The Continuous-Time Case* 



Jie Lu and Choon Yik Tang 
School of Electrical and Computer Engineering 
University of Oklahoma, Norman, OK 73019, USA 
{jie.lu-1, cytang}@ou.edu

January 20, 2013


Abstract 

This paper presents a set of continuous-time distributed algorithms that solve unconstrained, separable, convex optimization problems over undirected networks with fixed topologies. The algorithms are developed using a Lyapunov function candidate that exploits convexity. They are called Zero-Gradient-Sum (ZGS) algorithms because they yield nonlinear networked dynamical systems that evolve invariantly on a zero-gradient-sum manifold and converge asymptotically to the unknown optimizer. We also describe a systematic way to construct ZGS algorithms and show that a subset of them actually converge exponentially. For this subset, we obtain lower and upper bounds on the convergence rates in terms of the network topologies, problem characteristics, and algorithm parameters, including the algebraic connectivity, Laplacian spectral radius, and function curvatures. The findings of this paper may be regarded as a natural generalization of several well-known algorithms and results for distributed consensus, to distributed convex optimization.
1 Introduction

This paper addresses the problem of solving an unconstrained, separable, convex optimization problem over an $N$-node multi-hop network. Each node $i$ observes a convex function $f_i$, and all the $N$ nodes wish to determine an optimizer $x^*$ that minimizes the sum of the $f_i$'s, i.e.,

$$x^* \in \arg\min_{x \in \mathbb{R}^n} \sum_{i=1}^{N} f_i(x). \tag{1}$$

The problem (1) arises in many emerging and future applications of multi-agent systems and of wired, wireless, and social networks. In such applications, agents or nodes often need to collaborate in order to jointly accomplish sophisticated tasks in decentralized and optimal fashions [1].
To date, a family of discrete-time subgradient algorithms, aimed at solving problem (1) under general convexity assumptions, have been reported in the literature. These subgradient algorithms may be roughly classified into two groups.

The first group of algorithms [2]–[4] are incremental in nature, relying on the passing of an estimate of $x^*$ around the network to operate. The second group of algorithms [5]–[7] are non-incremental. They rely instead on a combination of subgradient updates and linear consensus iterations to operate, although gossip-based updates have also been considered [8]. For each of these algorithms, a number of convergence properties have been established, including the resulting error bounds, asymptotic convergence, and convergence rates.

In [9], we introduced two gossip-style, distributed asynchronous algorithms, referred to as Pairwise Equalizing (PE) and Pairwise Bisectioning (PB). They solve the scalar version of problem (1) in a manner that is fundamentally different from the aforementioned subgradient algorithms (e.g., PE and PB do not try to move along the gradient, nor do they require the notion of a stepsize).

In [10], we showed that the two basic ideas behind PE can be extended. These ideas are the conservation of a certain gradient sum at zero and the use of a convexity-inspired Lyapunov function. The extension leads to Controlled Hopwise Equalizing (CHE), a distributed asynchronous algorithm that allows individual nodes to use potential drops in the value of the Lyapunov function to control, on their own, when to initiate an iteration, so that problem (1) may be solved efficiently. In both papers [9], [10], problem (1) was studied in a discrete-time, asynchronous setting, and only its scalar version was considered.

In this paper, we address problem (1) from a continuous-time and multi-dimensional standpoint, building upon the two basic ideas behind PE. Specifically, we assume that each $f_i$ in (1) is twice continuously differentiable and strongly convex, and we use the same Lyapunov function candidate as the one for PE and CHE. We first derive a family of continuous-time distributed algorithms called Zero-Gradient-Sum (ZGS) algorithms. With these algorithms, the states of the resulting nonlinear networked dynamical systems slide along an invariant, zero-gradient-sum manifold and converge asymptotically to the unknown minimizer $x^*$ in (1).

We then describe a systematic way to construct ZGS algorithms and prove that a subset of them are exponentially convergent. For this subset of algorithms, we also obtain lower and upper bounds on their convergence rates as functions of the network topologies, problem characteristics, and algorithm parameters, including the algebraic connectivity, Laplacian spectral radius, and curvatures of the $f_i$'s.

As another contribution of this paper, we show that some of the existing continuous-time distributed consensus algorithms (e.g., [11]–[16]) are special cases of ZGS algorithms. Interestingly, they are just a slight modification away from solving any problem of the form (1). In addition, the well-known result from [12], which says that the convergence rate of a linear consensus algorithm is characterized by the algebraic connectivity of the underlying graph, is a special case of Theorem 2 here.
2 Preliminaries



A twice continuously differentiable function $f: \mathbb{R}^n \to \mathbb{R}$ is locally strongly convex if, for any convex and compact set $D \subset \mathbb{R}^n$, there exists a constant $\theta > 0$ such that the following equivalent conditions hold [17], [18]:

$$f(y) - f(x) - \nabla f(x)^T (y - x) \ge \frac{\theta}{2}\|y - x\|^2, \quad \forall x, y \in D, \tag{2}$$

$$\big(\nabla f(y) - \nabla f(x)\big)^T (y - x) \ge \theta \|y - x\|^2, \quad \forall x, y \in D, \tag{3}$$

$$\nabla^2 f(x) \ge \theta I_n, \quad \forall x \in D. \tag{4}$$

Here $\|\cdot\|$ denotes the Euclidean norm, and $\nabla f: \mathbb{R}^n \to \mathbb{R}^n$ is the gradient of $f$. $\nabla^2 f: \mathbb{R}^n \to \mathbb{R}^{n \times n}$ is the Hessian of $f$, and $I_n \in \mathbb{R}^{n \times n}$ is the identity matrix. The symbol $\ge$ denotes matrix inequality (i.e., $A \ge B$ means $A - B$ is a positive semidefinite matrix).

The function $f$ is strongly convex if there exists a constant $\theta > 0$ such that the equivalent conditions (2)–(4) hold for $D = \mathbb{R}^n$. In that case, $\theta$ is called the convexity parameter of $f$ [18].

Finally, consider any twice continuously differentiable function $f: \mathbb{R}^n \to \mathbb{R}$, any convex set $D \subset \mathbb{R}^n$, and any constant $\Theta > 0$. The following conditions are equivalent [17], [19]:

$$f(y) - f(x) - \nabla f(x)^T (y - x) \le \frac{\Theta}{2}\|y - x\|^2, \quad \forall x, y \in D, \tag{5}$$

$$\big(\nabla f(y) - \nabla f(x)\big)^T (y - x) \le \Theta \|y - x\|^2, \quad \forall x, y \in D, \tag{6}$$

$$\nabla^2 f(x) \le \Theta I_n, \quad \forall x \in D. \tag{7}$$
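The equivalent conditions above can be spot-checked numerically. The Python sketch below is only an illustration. It assumes a hypothetical scalar test function $f(x) = \log(1 + e^x) + x^2/2$, which does not appear in the paper. Its curvature $f''(x) = 1 + s(1 - s)$, with $s$ the logistic sigmoid, lies in $[1, 1.25]$. Conditions (2), (3), (5), and (6) should therefore hold on all of $\mathbb{R}$ with $\theta = 1$ and $\Theta = 1.25$. The helper name `conditions_hold` is introduced here.

```python
import math
import random

# Hypothetical test function (not from the paper): f(x) = log(1 + e^x) + x^2 / 2.
# Its curvature 1 + s(1 - s), with s the logistic sigmoid, lies in [1, 1.25].
THETA_LO, THETA_HI = 1.0, 1.25


def f(x):
    # log(1 + e^x) written as max(x, 0) + log1p(e^{-|x|}) to avoid overflow
    return max(x, 0.0) + math.log1p(math.exp(-abs(x))) + 0.5 * x * x


def df(x):
    return 1.0 / (1.0 + math.exp(-x)) + x


def conditions_hold(x, y, tol=1e-9):
    d = y - x
    bregman = f(y) - f(x) - df(x) * d    # left-hand side of (2) and (5)
    monotone = (df(y) - df(x)) * d       # left-hand side of (3) and (6)
    lower_ok = bregman >= 0.5 * THETA_LO * d * d - tol and monotone >= THETA_LO * d * d - tol
    upper_ok = bregman <= 0.5 * THETA_HI * d * d + tol and monotone <= THETA_HI * d * d + tol
    return lower_ok and upper_ok


random.seed(0)
pairs = [(random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0)) for _ in range(1000)]
print(all(conditions_hold(x, y) for x, y in pairs))
```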

3 Problem Formulation 

Consider a multi-hop network consisting of $N \ge 2$ nodes, connected by bidirectional links in a fixed topology. The network is modeled as a connected, undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$. Here $\mathcal{V} = \{1, 2, \ldots, N\}$ represents the set of $N$ nodes, and $\mathcal{E} \subset \{\{i, j\} : i, j \in \mathcal{V}, i \ne j\}$ represents the set of links.

Any two nodes $i, j \in \mathcal{V}$ are one-hop neighbors and can communicate if and only if $\{i, j\} \in \mathcal{E}$. The set of one-hop neighbors of each node $i \in \mathcal{V}$ is denoted as $\mathcal{N}_i = \{j \in \mathcal{V} : \{i, j\} \in \mathcal{E}\}$. The communications are assumed to be delay- and error-free, with no quantization.

Suppose each node $i \in \mathcal{V}$ observes a function $f_i: \mathbb{R}^n \to \mathbb{R}$ satisfying the following assumption:

Assumption 1. For each $i \in \mathcal{V}$, the function $f_i$ is twice continuously differentiable, strongly convex with convexity parameter $\theta_i > 0$, and has a locally Lipschitz Hessian $\nabla^2 f_i$.

Suppose, upon observing the $f_i$'s, all the nodes wish to solve the following unconstrained, separable, convex optimization problem:

$$\min_{x \in \mathbb{R}^n} F(x), \tag{8}$$

where the objective function $F: \mathbb{R}^n \to \mathbb{R}$ is defined as $F(x) = \sum_{i \in \mathcal{V}} f_i(x)$. The proposition below shows that $F$ has a unique minimizer $x^* \in \mathbb{R}^n$, so that problem (8) is well-posed:

Proposition 1. With Assumption 1, there exists a unique $x^* \in \mathbb{R}^n$ such that $F(x^*) < F(x)$ $\forall x \in \mathbb{R}^n - \{x^*\}$ and $\nabla F(x^*) = 0$.

Proof. See Theorem 6 in [20]. □

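Given the above network and problem, the aim of this paper is to devise a continuous-time distributed algorithm of the form

$$\dot{x}_i(t) = \varphi_i\big(x_i(t), x_{\mathcal{N}_i}(t); f_i, f_{\mathcal{N}_i}\big), \quad \forall t \ge 0, \ \forall i \in \mathcal{V}, \tag{9}$$

$$x_i(0) = \chi_i(f_i, f_{\mathcal{N}_i}), \quad \forall i \in \mathcal{V}. \tag{10}$$

The symbols in (9) and (10) are as follows:

- $t \ge 0$ denotes time.
- $x_i(t) \in \mathbb{R}^n$ is a state representing node $i$'s estimate of the unknown minimizer $x^*$ at time $t$.
- $x_{\mathcal{N}_i}(t) = (x_j(t))_{j \in \mathcal{N}_i} \in \mathbb{R}^{n|\mathcal{N}_i|}$ is a vector obtained by stacking $x_j(t)$ $\forall j \in \mathcal{N}_i$.
- $f_{\mathcal{N}_i} = (f_j)_{j \in \mathcal{N}_i}: \mathbb{R}^n \to \mathbb{R}^{|\mathcal{N}_i|}$ is a function obtained by stacking $f_j$ $\forall j \in \mathcal{N}_i$.
- $\varphi_i: \mathbb{R}^n \times \mathbb{R}^{n|\mathcal{N}_i|} \to \mathbb{R}^n$ is a locally Lipschitz function of $x_i(t)$ and $x_{\mathcal{N}_i}(t)$ governing the dynamics of $x_i(t)$. Its definition may depend on $f_i$ and $f_{\mathcal{N}_i}$.
- $\chi_i \in \mathbb{R}^n$ is a constant determining the initial state $x_i(0)$. Its value may depend on $f_i$ and $f_{\mathcal{N}_i}$.
- $|\cdot|$ denotes the cardinality of a set.

The quantities $x_i(t)$, $f_i$, $\varphi_i$, and $\chi_i$ are maintained in node $i$'s local memory. The goal of the algorithm (9) and (10) is to steer all the estimates $x_i(t)$'s asymptotically (or, better yet, exponentially) to the unknown $x^*$, i.e.,

$$\lim_{t \to \infty} x_i(t) = x^*, \quad \forall i \in \mathcal{V}, \tag{11}$$

enabling all the nodes to cooperatively solve problem (8).

Note what realizing (9) and (10) requires of the neighbors. For each $i \in \mathcal{V}$, every node $j \in \mathcal{N}_i$ must send node $i$ its $x_j(t)$ at each time $t$ if $\varphi_i$ does depend on $x_j(t)$. It must also send its $f_j$ at time $t = 0$ if $\varphi_i$ or $\chi_i$ does depend on $f_j$.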

Remark 1. As it turns out and will be shown in Section 4, each $\varphi_i$ and $\chi_i$ in (9) and (10) do not have to depend on $f_{\mathcal{N}_i}$, so the nodes do not have to exchange their $f_i$'s. We note that the algorithm PB in [9] also exhibits this feature, but the algorithms PE in [9] and CHE in [10] do not.



4 Zero-Gradient-Sum Algorithms 

In this section, we develop a family of algorithms that achieve the stated goal. To facilitate the development, we let $\mathbf{x}^* = (x^*, x^*, \ldots, x^*) \in \mathbb{R}^{nN}$ denote the minimizer vector and $\mathbf{x} = (x_1, x_2, \ldots, x_N) \in \mathbb{R}^{nN}$ denote the state vector. We write the latter as $\mathbf{x}(t) = (x_1(t), x_2(t), \ldots, x_N(t))$ when we wish to emphasize time or view it as a state trajectory.

Consider a Lyapunov function candidate $V: \mathbb{R}^{nN} \to \mathbb{R}$, defined in terms of the observed $f_i$'s as

$$V(\mathbf{x}) = \sum_{i \in \mathcal{V}} \Big( f_i(x^*) - f_i(x_i) - \nabla f_i(x_i)^T (x^* - x_i) \Big). \tag{12}$$

Notice that $V$ in (12) is continuously differentiable because of Assumption 1, and that it satisfies $V(\mathbf{x}^*) = 0$. Moreover, $V$ is positive definite with respect to $\mathbf{x}^*$ and is radially unbounded. To see this, note that Assumption 1 and the first-order strong convexity condition (2) imply

$$V(\mathbf{x}) \ge \sum_{i \in \mathcal{V}} \frac{\theta_i}{2} \|x^* - x_i\|^2, \quad \forall \mathbf{x} \in \mathbb{R}^{nN}, \tag{13}$$

and (13) in turn implies $V(\mathbf{x}) > 0$ $\forall \mathbf{x} \ne \mathbf{x}^*$ and $V(\mathbf{x}) \to \infty$ as $\|\mathbf{x}\| \to \infty$. Therefore, $V$ in (12) is a legitimate Lyapunov function candidate, which may be used to derive algorithms that ensure (11).
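To make (12) and (13) concrete, the following sketch evaluates $V$ for a hypothetical scalar ($n = 1$) instance with $N = 4$. The local functions are $f_i(x) = \log(1 + e^{x - c_i}) + \frac{a_i}{2}x^2$, where the values of $a_i$ and $c_i$ are arbitrary choices made here and are not taken from the paper. Each such $f_i$ is strongly convex with $\theta_i = a_i$. The minimizer $x^*$ is computed centrally by bisection only so that $V$ can be evaluated; no node running a ZGS algorithm needs it.

```python
import math

# Hypothetical scalar (n = 1) example, not from the paper:
#   f_i(x) = log(1 + e^{x - c_i}) + (a_i / 2) x^2,
# strongly convex with theta_i = a_i because f_i'' = a_i + s(1 - s) >= a_i.
A = [1.0, 2.0, 0.5, 1.5]
C = [-2.0, 0.0, 1.0, 3.0]
N = len(A)


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def f(i, x):
    u = x - C[i]
    return max(u, 0.0) + math.log1p(math.exp(-abs(u))) + 0.5 * A[i] * x * x


def df(i, x):
    return sigmoid(x - C[i]) + A[i] * x


def bisect_root(g, lo=-100.0, hi=100.0, iters=200):
    # Root of a strictly increasing scalar function g with g(lo) < 0 < g(hi).
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# x* solves F'(x) = sum_i f_i'(x) = 0 (Proposition 1); used here only to evaluate V.
x_star = bisect_root(lambda x: sum(df(i, x) for i in range(N)))


def lyapunov(xs):
    # V(x) in (12) for a stacked state xs = (x_1, ..., x_N).
    return sum(f(i, x_star) - f(i, xi) - df(i, xi) * (x_star - xi) for i, xi in enumerate(xs))


def strong_convexity_bound(xs):
    # Right-hand side of (13) with theta_i = a_i.
    return sum(0.5 * A[i] * (x_star - xi) ** 2 for i, xi in enumerate(xs))


state = [0.3, -1.0, 2.0, 0.7]
print(lyapunov([x_star] * N), lyapunov(state) >= strong_convexity_bound(state))
```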

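Take the time derivative of $V$ along the state trajectory $\mathbf{x}(t)$ of the system (9), and call it $\dot V: \mathbb{R}^{nN} \to \mathbb{R}$. Since $\frac{d}{dt}\nabla f_i(x_i(t)) = \nabla^2 f_i(x_i(t))\,\dot x_i(t)$, the terms $\nabla f_i(x_i)^T \dot x_i$ cancel, and we obtain

$$\dot V(\mathbf{x}(t)) = \sum_{i \in \mathcal{V}} \big(x_i(t) - x^*\big)^T \nabla^2 f_i(x_i(t))\, \varphi_i\big(x_i(t), x_{\mathcal{N}_i}(t); f_i, f_{\mathcal{N}_i}\big), \quad \forall t \ge 0. \tag{14}$$

Due to Assumption 1 and to each $\varphi_i$ being locally Lipschitz, $\dot V$ in (14) is continuous. In addition, it yields $\dot V(\mathbf{x}^*) = 0$. Suppose the functions $\varphi_i$ $\forall i \in \mathcal{V}$ are such that $\dot V$ is negative definite with respect to $\mathbf{x}^*$, i.e.,

$$\sum_{i \in \mathcal{V}} (x_i - x^*)^T \nabla^2 f_i(x_i)\, \varphi_i(x_i, x_{\mathcal{N}_i}; f_i, f_{\mathcal{N}_i}) < 0, \quad \forall \mathbf{x} \ne \mathbf{x}^*. \tag{15}$$

Then the system (9) would have a unique equilibrium point at $\mathbf{x}^*$, which by the Barbashin-Krasovskii theorem [21] would be globally asymptotically stable. Consequently, regardless of how the constants $\chi_i$ $\forall i \in \mathcal{V}$ in (10) are chosen, the goal (11) would be accomplished.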

As it follows from the above, the challenge lies in finding $\varphi_i$ $\forall i \in \mathcal{V}$ that collectively satisfy (15). Such $\varphi_i$'s, however, may be difficult to construct because $x^*$ in (15) is unknown to any of the nodes. That is, $x^*$ depends on every $f_i$ via (8), but $\varphi_i$ maintained by each node $i \in \mathcal{V}$ can only depend on $f_i$ and $f_{\mathcal{N}_i}$.

As a result, one cannot let the $\varphi_i$'s depend on $x^*$. An example of such a choice is $\varphi_i(x_i, x_{\mathcal{N}_i}; f_i, f_{\mathcal{N}_i}) = x^* - x_i$ $\forall i \in \mathcal{V}$, even though it guarantees (15) (since each $\nabla^2 f_i(x_i)$ is positive definite, by (4)).

Given that the required $\varphi_i$'s are not readily apparent, we do not search for them. Instead, below we present an alternative approach toward the goal (11). It uses the same $V$ and $\dot V$ as in (12) and (14), but demands neither local nor global asymptotic stability.

To state the approach, we first introduce two definitions. Let $\mathcal{A} \subset \mathbb{R}^{nN}$ represent the agreement set and $\mathcal{M} \subset \mathbb{R}^{nN}$ represent the zero-gradient-sum manifold, defined respectively as

$$\mathcal{A} = \{(y_1, y_2, \ldots, y_N) \in \mathbb{R}^{nN} : y_1 = y_2 = \cdots = y_N\}, \tag{16}$$

$$\mathcal{M} = \Big\{(y_1, y_2, \ldots, y_N) \in \mathbb{R}^{nN} : \sum_{i \in \mathcal{V}} \nabla f_i(y_i) = 0\Big\}. \tag{17}$$

Thus $\mathbf{x} \in \mathcal{A}$ if and only if all the $x_i$'s agree. Likewise, $\mathbf{x} \in \mathcal{M}$ if and only if the sum of all the gradients $\nabla f_i$'s, evaluated respectively at the $x_i$'s, is zero.

Notice from (16) that $\mathbf{x}^* \in \mathcal{A}$, and from (17) and Proposition 1 that $\mathbf{x}^* \in \mathcal{M}$. Conversely, if $\mathbf{x} = (y, \ldots, y) \in \mathcal{A} \cap \mathcal{M}$, then $\nabla F(y) = 0$, so $y = x^*$ by Proposition 1. Thus, $\mathcal{A} \cap \mathcal{M} = \{\mathbf{x}^*\}$.

Also note that $\mathcal{M}$ is closed, from the continuity of each $\nabla f_i$. Moreover, $\mathcal{M}$ is indeed a manifold of dimension $n(N-1)$, by the Implicit Function Theorem and the nonsingularity of each $\nabla^2 f_i(x)$ $\forall x \in \mathbb{R}^n$.
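The dimension count for $\mathcal{M}$ can be seen constructively in the scalar case. The first $N-1$ coordinates may be chosen freely, and the last one is then pinned down uniquely because $\nabla f_N$ is strictly increasing. The sketch below builds such a point and tests membership in $\mathcal{A}$ and $\mathcal{M}$. It assumes a hypothetical instance with $n = 1$, $N = 4$, and $f_i(x) = \log(1 + e^{x - c_i}) + \frac{a_i}{2}x^2$, with $a_i$ and $c_i$ chosen arbitrarily here. The helper names are illustrative only.

```python
import math

# Hypothetical scalar (n = 1) objectives f_i(x) = log(1 + e^{x - c_i}) + (a_i / 2) x^2.
A = [1.0, 2.0, 0.5, 1.5]
C = [-2.0, 0.0, 1.0, 3.0]
N = len(A)


def df(i, x):
    # f_i'(x) = sigmoid(x - c_i) + a_i x, strictly increasing in x
    return 1.0 / (1.0 + math.exp(-(x - C[i]))) + A[i] * x


def bisect_root(g, lo=-100.0, hi=100.0, iters=200):
    # Root of a strictly increasing scalar function g with g(lo) < 0 < g(hi).
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def gradient_sum(xs):
    return sum(df(i, xi) for i, xi in enumerate(xs))


def in_agreement_set(xs, tol=1e-9):
    # Membership in A, (16)
    return max(xs) - min(xs) <= tol


def on_zgs_manifold(xs, tol=1e-9):
    # Membership in M, (17)
    return abs(gradient_sum(xs)) <= tol


def complete_to_manifold(free):
    # Choose x_1, ..., x_{N-1} freely, then solve f_N'(x_N) = -sum_{i<N} f_i'(x_i).
    # The solution is unique because f_N' is strictly increasing.
    target = -sum(df(i, xi) for i, xi in enumerate(free))
    last = len(free)
    return list(free) + [bisect_root(lambda x: df(last, x) - target)]


point = complete_to_manifold([0.5, -1.0, 2.0])
x_star = bisect_root(lambda x: gradient_sum([x] * N))
print(on_zgs_manifold(point), in_agreement_set(point))
print(on_zgs_manifold([x_star] * N), in_agreement_set([x_star] * N))
```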

Having introduced $\mathcal{A}$ and $\mathcal{M}$, we now describe the approach. It rests on the following observation. To attain the goal (11), condition (15), which ensures that every trajectory $\mathbf{x}(t)$ goes to $\mathbf{x}^*$, is sufficient but not necessary. All that is needed is a single trajectory $\mathbf{x}(t)$ along which $\dot V(\mathbf{x}(t)) \le 0$ $\forall t \ge 0$ and $\lim_{t \to \infty} V(\mathbf{x}(t)) = 0$, since the latter implies (11) by (13).

With this in mind, we next derive three conditions on the $\varphi_i$'s and $\chi_i$'s in (9) and (10) that produce such a trajectory. Assume, for a moment, that the $\chi_i$'s dictating the initial state $\mathbf{x}(0)$ have been decided. We may then focus on the $\varphi_i$'s that shape the trajectory $\mathbf{x}(t)$ leaving $\mathbf{x}(0)$.

Observe that $\dot V$ in (14) takes the form $\dot V(\mathbf{x}(t)) = \Phi_1(\mathbf{x}(t)) - x^{*T}\Phi_2(\mathbf{x}(t))$ $\forall t \ge 0$, where $\Phi_1: \mathbb{R}^{nN} \to \mathbb{R}$ and $\Phi_2: \mathbb{R}^{nN} \to \mathbb{R}^n$. The unknown $x^*$ may undesirably affect the sign of $\dot V(\mathbf{x}(t))$. It can be eliminated by setting $\Phi_2(\mathbf{x}) = 0$ $\forall \mathbf{x} \in \mathbb{R}^{nN}$, i.e., by forcing the $\varphi_i$'s to satisfy

$$\sum_{i \in \mathcal{V}} \nabla^2 f_i(x_i)\, \varphi_i(x_i, x_{\mathcal{N}_i}; f_i, f_{\mathcal{N}_i}) = 0, \quad \forall \mathbf{x} \in \mathbb{R}^{nN}. \tag{18}$$

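With this first condition (18), $\dot V$ becomes free of $x^*$, reducing to

$$\dot V(\mathbf{x}(t)) = \sum_{i \in \mathcal{V}} x_i(t)^T \nabla^2 f_i(x_i(t))\, \varphi_i\big(x_i(t), x_{\mathcal{N}_i}(t); f_i, f_{\mathcal{N}_i}\big), \quad \forall t \ge 0. \tag{19}$$

Next, notice that whenever $\mathbf{x}(t)$ is in the agreement set $\mathcal{A}$, $\dot V(\mathbf{x}(t))$ in (19) must vanish, due to (16) and (18). However, whenever $\mathbf{x}(t) \notin \mathcal{A}$, there is no such restriction. Hence, any time $\mathbf{x}(t) \notin \mathcal{A}$, $\dot V(\mathbf{x}(t))$ can be made negative by forcing the $\varphi_i$'s to also satisfy

$$\sum_{i \in \mathcal{V}} x_i^T \nabla^2 f_i(x_i)\, \varphi_i(x_i, x_{\mathcal{N}_i}; f_i, f_{\mathcal{N}_i}) < 0, \quad \forall \mathbf{x} \in \mathbb{R}^{nN} - \mathcal{A}. \tag{20}$$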
iev 

With this additional, second condition (20), $\dot V(\mathbf{x}(t)) \le 0$ along $\mathbf{x}(t)$ no matter what $x^*$ is, with equality if and only if $\mathbf{x}(t) \in \mathcal{A}$. Finally, note that (9) and (18) imply

$$\frac{d}{dt}\sum_{i \in \mathcal{V}} \nabla f_i(x_i(t)) = \sum_{i \in \mathcal{V}} \nabla^2 f_i(x_i(t))\, \dot x_i(t) = 0, \quad \forall t \ge 0,$$

while (11), the continuity of each $\nabla f_i$, and Proposition 1 imply

$$\lim_{t \to \infty} \sum_{i \in \mathcal{V}} \nabla f_i(x_i(t)) = \sum_{i \in \mathcal{V}} \nabla f_i\Big(\lim_{t \to \infty} x_i(t)\Big) = \sum_{i \in \mathcal{V}} \nabla f_i(x^*) = \nabla F(x^*) = 0.$$

The former says that, once the $\varphi_i$'s satisfy (18), the gradient sum $\sum_{i \in \mathcal{V}} \nabla f_i(x_i(t))$ along $\mathbf{x}(t)$ remains constant over time. The latter says that to achieve $\lim_{t \to \infty} V(\mathbf{x}(t)) = 0$, or equivalently (11), this constant sum must be zero, i.e., $\sum_{i \in \mathcal{V}} \nabla f_i(x_i(t)) = 0$ $\forall t \ge 0$. Therefore, in view of (10), the $\chi_i$'s must be such that

$$\sum_{i \in \mathcal{V}} \nabla f_i\big(\chi_i(f_i, f_{\mathcal{N}_i})\big) = 0, \tag{21}$$

yielding the third and final condition.

By imposing algebraic constraints on the $\varphi_i$'s and $\chi_i$'s, conditions (18), (20), and (21) characterize a family of algorithms. This family shares a number of properties, including one with a nice geometric interpretation. Observe from (21), (10), and (17) that $\mathbf{x}(0) \in \mathcal{M}$, and further from (9) and (18) that $\mathbf{x}(t) \in \mathcal{M}$ $\forall t \ge 0$.

Thus, every algorithm in the family produces a nonlinear networked dynamical system whose trajectory $\mathbf{x}(t)$ begins on, and slides along, the zero-gradient-sum manifold $\mathcal{M}$. This makes $\mathcal{M}$ a positively invariant set. Due to this geometric interpretation, these algorithms are referred to as follows:

Definition 1. A continuous-time distributed algorithm of the form (9) and (10) is said to be a Zero-Gradient-Sum (ZGS) algorithm if the following hold: $\varphi_i$ $\forall i \in \mathcal{V}$ are locally Lipschitz and satisfy (18) and (20), and $\chi_i$ $\forall i \in \mathcal{V}$ satisfy (21).

The following theorem lists the properties shared by ZGS algorithms. It shows that every one of them is capable of asymptotically driving $\mathbf{x}(t)$ to $\mathbf{x}^*$, thereby solving problem (8):

Theorem 1. Consider the network modeled in Section 3 and the use of a ZGS algorithm described in Definition 1. Suppose Assumption 1 holds. Then:

(i) there exists a unique solution $\mathbf{x}(t)$ $\forall t \ge 0$ to (9) and (10);

(ii) $\mathbf{x}(t) \in \mathcal{M}$ $\forall t \ge 0$;

(iii) $\dot V(\mathbf{x}(t)) \le 0$ $\forall t \ge 0$, with equality if and only if $\mathbf{x}(t) = \mathbf{x}^*$;

(iv) $\lim_{t \to \infty} V(\mathbf{x}(t)) = 0$; and

(v) $\lim_{t \to \infty} \mathbf{x}(t) = \mathbf{x}^*$, i.e., (11) holds.

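Proof. Since $\varphi_i$ $\forall i \in \mathcal{V}$ are locally Lipschitz, to prove (i) it suffices to show that every solution $\mathbf{x}(t)$ of (9) and (10) lies entirely in a compact subset of $\mathbb{R}^{nN}$. To this end, let $\mathcal{B}(\mathbf{x}^*, r) \subset \mathbb{R}^{nN}$ denote the closed ball of radius $r \in [0, \infty)$ centered at $\mathbf{x}^*$, i.e., $\mathcal{B}(\mathbf{x}^*, r) = \{\mathbf{y} \in \mathbb{R}^{nN} : \|\mathbf{y} - \mathbf{x}^*\| \le r\}$.

Note from (14), (18), and (20) that $\dot V(\mathbf{x}(t)) \le 0$ along $\mathbf{x}(t)$. This, together with (13), implies that $V(\mathbf{x}(0)) \ge V(\mathbf{x}(t)) \ge \sum_{i \in \mathcal{V}} \frac{\theta_i}{2}\|x_i(t) - x^*\|^2$ along $\mathbf{x}(t)$. Hence, $\mathbf{x}(t) \in \mathcal{B}\big(\mathbf{x}^*, \sqrt{2V(\mathbf{x}(0))/\min_{i \in \mathcal{V}} \theta_i}\big)$ $\forall t \ge 0$, ensuring (i).

Statement (ii) has been proven in the paragraph before Definition 1. To verify (iii), notice again from (14), (18), and (20) that $\dot V(\mathbf{x}(t)) = 0$ if and only if $\mathbf{x}(t) \in \mathcal{A}$. Due to (ii) and to $\mathcal{A} \cap \mathcal{M} = \{\mathbf{x}^*\}$ shown earlier, (iii) holds.

To prove (iv) and (v), we will apply LaSalle's invariance principle from Theorem 4.4 in [21] to the dynamics (9). Let $\Omega = \mathcal{M} \cap \{\mathbf{y} \in \mathbb{R}^{nN} : V(\mathbf{y}) \le V(\mathbf{x}(0))\}$. Notice that $\Omega$ is compact, since $\mathcal{M}$ is closed and $V$ in (12) is continuous and satisfies (13).

Also note from (17), (9), and (18) that $\mathcal{M}$ is positively invariant. Furthermore, (14), (18), (20), $\mathcal{A} \cap \mathcal{M} = \{\mathbf{x}^*\}$, and $\mathbf{x}^* \in \Omega \subset \mathcal{M}$ give $\dot V(\mathbf{x}) \le 0$ $\forall \mathbf{x} \in \Omega$, with equality if and only if $\mathbf{x} = \mathbf{x}^*$. Thus, $\Omega$ is positively invariant as well.

Moreover, the largest invariant set in $\{\mathbf{y} \in \Omega : \dot V(\mathbf{y}) = 0\} = \{\mathbf{x}^*\}$ is $\{\mathbf{x}^*\}$, since it must be nonempty. It follows from Theorem 4.4 in [21] that every solution starting in $\Omega$, including $\mathbf{x}(t)$, approaches $\mathbf{x}^*$ as $t \to \infty$. Therefore, (v) holds and, by the continuity of $V$, (iv) follows. □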

Having established Theorem 1, we now present a systematic way to construct ZGS algorithms. First, to find $\chi_i$'s that meet condition (21), consider the following proposition, which shows that each $f_i$ has a unique minimizer $x_i^* \in \mathbb{R}^n$:

Proposition 2. With Assumption 1, for each $i \in \mathcal{V}$, there exists a unique $x_i^* \in \mathbb{R}^n$ such that $f_i(x_i^*) < f_i(x)$ $\forall x \in \mathbb{R}^n - \{x_i^*\}$ and $\nabla f_i(x_i^*) = 0$.

Proof. See Theorem 6 in [20]. □
Proposition 2 implies that $\sum_{i \in \mathcal{V}} \nabla f_i(x_i^*) = 0$. Hence, (21) can be met by simply letting

$$\chi_i(f_i, f_{\mathcal{N}_i}) = x_i^*, \quad \forall i \in \mathcal{V}, \tag{22}$$

which is permissible since every $x_i^*$ in (22) depends just on $f_i$. It follows that, to execute (10) and (22), each node $i \in \mathcal{V}$ must solve the "local" convex optimization problem $\min_{x \in \mathbb{R}^n} f_i(x)$ for $x_i^*$ before time $t = 0$. We note, however, that (22) is sufficient for ensuring (21) but not necessary.
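The initialization (22) amounts to each node minimizing its own $f_i$. The following is a minimal sketch under several assumptions. It uses a hypothetical scalar instance with $N = 4$ and $f_i(x) = \log(1 + e^{x - c_i}) + \frac{a_i}{2}x^2$, where $a_i$ and $c_i$ are chosen arbitrarily here. It also uses bisection on $f_i'$ as a stand-in for whatever local convex solver a node might actually run; Newton's method would be a natural choice in $\mathbb{R}^n$.

```python
import math

# Hypothetical scalar (n = 1) objectives f_i(x) = log(1 + e^{x - c_i}) + (a_i / 2) x^2.
A = [1.0, 2.0, 0.5, 1.5]
C = [-2.0, 0.0, 1.0, 3.0]
N = len(A)


def df(i, x):
    return 1.0 / (1.0 + math.exp(-(x - C[i]))) + A[i] * x


def bisect_root(g, lo=-100.0, hi=100.0, iters=200):
    # Stand-in local solver: root of a strictly increasing scalar function.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Node i solves min_x f_i(x), i.e. f_i'(x_i*) = 0, using only its own f_i,
# and sets chi_i = x_i* as in (22).
x_local = [bisect_root(lambda x, i=i: df(i, x)) for i in range(N)]

# Condition (21): the gradient sum at the initial state vanishes.
initial_gradient_sum = sum(df(i, xi) for i, xi in enumerate(x_local))
print(x_local, initial_gradient_sum)
```

Each node uses only its own $f_i$ here, consistent with Remark 1. Condition (21) then holds up to the solver's tolerance, even though the $x_i^*$'s themselves disagree.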

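Next, we generate locally Lipschitz $\varphi_i$'s that ensure conditions (18) and (20). Notice that each $\varphi_i$ is premultiplied by $\nabla^2 f_i(x_i)$, which is nonsingular $\forall x_i \in \mathbb{R}^n$. Therefore, the impact of each $\nabla^2 f_i(x_i)$ can be absorbed by setting

$$\varphi_i(x_i, x_{\mathcal{N}_i}; f_i, f_{\mathcal{N}_i}) = \big(\nabla^2 f_i(x_i)\big)^{-1} \phi_i(x_i, x_{\mathcal{N}_i}; f_i, f_{\mathcal{N}_i}), \quad \forall i \in \mathcal{V}, \tag{23}$$

where $\phi_i: \mathbb{R}^n \times \mathbb{R}^{n|\mathcal{N}_i|} \to \mathbb{R}^n$ is a locally Lipschitz function of $x_i$ and $x_{\mathcal{N}_i}$ maintained by node $i$.

For each $i \in \mathcal{V}$, $\nabla^2 f_i$ is locally Lipschitz (due to Assumption 1). In addition, the determinant of $\nabla^2 f_i(x_i)$ for every $x_i \in \mathbb{R}^n$ is no less than a positive constant (due further to (4)). Hence the mapping $(\nabla^2 f_i(\cdot))^{-1}: \mathbb{R}^n \to \mathbb{R}^{n \times n}$ in (23) is locally Lipschitz. Thus, as long as the $\phi_i$'s are locally Lipschitz, so are the resulting $\varphi_i$'s, fulfilling the requirement. With (23), the dynamics (9) become

$$\dot x_i(t) = \big(\nabla^2 f_i(x_i(t))\big)^{-1} \phi_i\big(x_i(t), x_{\mathcal{N}_i}(t); f_i, f_{\mathcal{N}_i}\big), \quad \forall t \ge 0, \ \forall i \in \mathcal{V}, \tag{24}$$

and conditions (18) and (20) simplify to

$$\sum_{i \in \mathcal{V}} \phi_i(x_i, x_{\mathcal{N}_i}; f_i, f_{\mathcal{N}_i}) = 0, \quad \forall \mathbf{x} \in \mathbb{R}^{nN}, \tag{25}$$

$$\sum_{i \in \mathcal{V}} x_i^T \phi_i(x_i, x_{\mathcal{N}_i}; f_i, f_{\mathcal{N}_i}) < 0, \quad \forall \mathbf{x} \in \mathbb{R}^{nN} - \mathcal{A}. \tag{26}$$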
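Finally, we come up with locally Lipschitz $\phi_i$'s that assure conditions (25) and (26). Suppose each $\phi_i$ is decomposed as

$$\phi_i(x_i, x_{\mathcal{N}_i}; f_i, f_{\mathcal{N}_i}) = \sum_{j \in \mathcal{N}_i} \phi_{ij}(x_i, x_j; f_i, f_j), \quad \forall i \in \mathcal{V}, \tag{27}$$

so that the dynamics (24) become

$$\dot x_i(t) = \big(\nabla^2 f_i(x_i(t))\big)^{-1} \sum_{j \in \mathcal{N}_i} \phi_{ij}(x_i(t), x_j(t); f_i, f_j), \quad \forall t \ge 0, \ \forall i \in \mathcal{V}, \tag{28}$$

where $\phi_{ij}: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ is a locally Lipschitz function of $x_i$ and $x_j$ maintained by node $i$. Then, (25) can be ensured by requiring that every $\phi_{ij}$ and $\phi_{ji}$ pair be negatives of each other, i.e.,

$$\phi_{ij}(y, z; f_i, f_j) = -\phi_{ji}(z, y; f_j, f_i), \quad \forall i \in \mathcal{V}, \ \forall j \in \mathcal{N}_i, \ \forall y, z \in \mathbb{R}^n, \tag{29}$$

since $\sum_{i \in \mathcal{V}} \phi_i = \sum_{i \in \mathcal{V}} \sum_{j \in \mathcal{N}_i} \phi_{ij} = \sum_{\{i,j\} \in \mathcal{E}} (\phi_{ij} + \phi_{ji}) = 0$. With (27) and (29), the left-hand side of (26) turns into

$$\sum_{i \in \mathcal{V}} \sum_{j \in \mathcal{N}_i} x_i^T \phi_{ij}(x_i, x_j; f_i, f_j) = -\frac{1}{2} \sum_{i \in \mathcal{V}} \sum_{j \in \mathcal{N}_i} (x_j - x_i)^T \phi_{ij}(x_i, x_j; f_i, f_j), \quad \forall \mathbf{x} \in \mathbb{R}^{nN}. \tag{30}$$

Because the graph $\mathcal{G}$ is connected, for any $\mathbf{x} \in \mathbb{R}^{nN} - \mathcal{A}$ there exist $i \in \mathcal{V}$ and $j \in \mathcal{N}_i$ such that $x_i \ne x_j$. Hence, (26) can be guaranteed by requiring the $\phi_{ij}$'s to also satisfy

$$(y - z)^T \phi_{ij}(y, z; f_i, f_j) < 0, \quad \forall i \in \mathcal{V}, \ \forall j \in \mathcal{N}_i, \ \forall y, z \in \mathbb{R}^n, \ y \ne z. \tag{31}$$

Under (31), every summand on the right-hand side of (30) is nonpositive, and the summand for such a pair $(i, j)$ is negative.

Note that if (29) holds, then $\phi_{ij}$ satisfies the inequality in (31) if and only if $\phi_{ji}$ does. Therefore, every pair of neighboring nodes $i, j \in \mathcal{V}$ needs only minimal coordination before time $t = 0$ to realize the dynamics (28). Only one of them, say node $i$, needs to construct a $\phi_{ij}$ that satisfies the inequality in (31). The other, node $j$, only needs to make sure that $\phi_{ji} = -\phi_{ij}$ in the sense of (29).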
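The bookkeeping behind (25), (29), and (30) can be checked numerically. The sketch below assumes a hypothetical 4-node cycle with $n = 2$, in which each link is "owned" by one endpoint. The owner builds $\phi_{ij}(y, z) = \big((z_\ell - y_\ell)(1 + y_\ell^2)\big)_{\ell}$, a choice made here that satisfies (31) but is not symmetric in its arguments. The other endpoint simply uses the owner's function negated with its arguments swapped, as in (29). The script then confirms three things at a random non-agreement state: the $\phi_i$'s sum to zero, the two sides of (30) agree, and the left-hand side of (26) is negative.

```python
import math
import random

# Hypothetical 4-node cycle with n = 2; each link (i, j) is listed once and "owned" by i.
N, n = 4, 2
OWNED_LINKS = [(0, 1), (1, 2), (2, 3), (3, 0)]
NEIGHBORS = {i: [] for i in range(N)}
for i, j in OWNED_LINKS:
    NEIGHBORS[i].append(j)
    NEIGHBORS[j].append(i)


def owner_phi(y, z):
    # Built by the owning node: (z_l - y_l)(1 + y_l^2) per coordinate,
    # so (y - z)^T phi(y, z) = -sum_l (z_l - y_l)^2 (1 + y_l^2) < 0 whenever y != z.
    return [(zk - yk) * (1.0 + yk * yk) for yk, zk in zip(y, z)]


def phi(i, j, y, z):
    # phi_ij(y, z); the non-owning endpoint enforces (29): phi_ij(y, z) = -phi_ji(z, y).
    if (i, j) in OWNED_LINKS:
        return owner_phi(y, z)
    return [-v for v in owner_phi(z, y)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def node_phi(i, x):
    # phi_i in (27): sum of phi_ij over the neighbors j of node i.
    total = [0.0] * n
    for j in NEIGHBORS[i]:
        total = [t + v for t, v in zip(total, phi(i, j, x[i], x[j]))]
    return total


random.seed(1)
x = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(N)]
phis = [node_phi(i, x) for i in range(N)]
sum25 = [sum(p[k] for p in phis) for k in range(n)]                 # left-hand side of (25)
lhs26 = sum(dot(x[i], phis[i]) for i in range(N))                   # left-hand side of (26)
rhs30 = -0.5 * sum(dot([b - a for a, b in zip(x[i], x[j])], phi(i, j, x[i], x[j]))
                   for i in range(N) for j in NEIGHBORS[i])         # right-hand side of (30)
print(sum25, lhs26, rhs30)
```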

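Examples 1 and 2 below illustrate two concrete ways to construct $\phi_{ij}$'s that obey (29) and (31):

Example 1. Let $\phi_{ij}(y, z; f_i, f_j) = \big(\psi_{ij1}(y_1, z_1), \psi_{ij2}(y_2, z_2), \ldots, \psi_{ijn}(y_n, z_n)\big)$ $\forall i \in \mathcal{V}$, $\forall j \in \mathcal{N}_i$, $\forall y = (y_1, y_2, \ldots, y_n) \in \mathbb{R}^n$, $\forall z = (z_1, z_2, \ldots, z_n) \in \mathbb{R}^n$. Each $\psi_{ij\ell}: \mathbb{R}^2 \to \mathbb{R}$ can be any locally Lipschitz function satisfying $\psi_{ij\ell}(y_\ell, z_\ell) = -\psi_{ji\ell}(z_\ell, y_\ell)$ and $(y_\ell - z_\ell)\psi_{ij\ell}(y_\ell, z_\ell) < 0$ $\forall y_\ell \ne z_\ell$. Examples include $\psi_{ij\ell}(y_\ell, z_\ell) = \tanh(z_\ell - y_\ell)$ and $\psi_{ij\ell}(y_\ell, z_\ell) = -\psi_{ji\ell}(z_\ell, y_\ell) = (z_\ell - y_\ell)^3$. Then, (29) and (31) hold. ■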
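Example 1 can be exercised directly. The sketch below applies the coupling $\psi_{ij\ell}(y_\ell, z_\ell) = \tanh(z_\ell - y_\ell)$ componentwise and checks (29) and (31) on random pairs in $\mathbb{R}^3$. Using the same $\psi$ for both directions of a link is an assumption made here; it works because $\tanh$ is odd. The function names are hypothetical.

```python
import math
import random


def psi_tanh(y_l, z_l):
    # Scalar coupling suggested in Example 1: psi(y_l, z_l) = tanh(z_l - y_l).
    return math.tanh(z_l - y_l)


def phi_componentwise(psi, y, z):
    # phi_ij(y, z) = (psi(y_1, z_1), ..., psi(y_n, z_n)), as in Example 1.
    return [psi(y_l, z_l) for y_l, z_l in zip(y, z)]


def satisfies_29_and_31(psi, y, z, tol=1e-12):
    phi_ij = phi_componentwise(psi, y, z)
    phi_ji = phi_componentwise(psi, z, y)
    antisymmetric = all(abs(a + b) <= tol for a, b in zip(phi_ij, phi_ji))  # (29)
    inner = sum((y_l - z_l) * p for y_l, z_l, p in zip(y, z, phi_ij))        # (y - z)^T phi_ij
    return antisymmetric and (inner < 0.0 or y == z)                          # (31)


random.seed(2)
samples = [([random.gauss(0.0, 1.0) for _ in range(3)], [random.gauss(0.0, 1.0) for _ in range(3)])
           for _ in range(500)]
print(all(satisfies_29_and_31(psi_tanh, y, z) for y, z in samples))
```

Reversing the sign inside $\tanh$ would keep (29) but violate (31), which is why the argument order $z_\ell - y_\ell$ matters.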

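Example 2. Let $\phi_{ij}(y, z; f_i, f_j) = \nabla g_{\{i,j\}}(z) - \nabla g_{\{i,j\}}(y)$ $\forall i \in \mathcal{V}$, $\forall j \in \mathcal{N}_i$, $\forall y, z \in \mathbb{R}^n$. Each $g_{\{i,j\}}: \mathbb{R}^n \to \mathbb{R}$ can be any twice continuously differentiable and locally strongly convex function associated with link $\{i,j\} \in \mathcal{E}$. One choice is $g_{\{i,j\}}(y) = \frac{1}{2} y^T A_{\{i,j\}} y$, where $A_{\{i,j\}} \in \mathbb{R}^{n \times n}$ is any symmetric positive definite matrix. Another is $g_{\{i,j\}}(y) = f_i(y) + f_j(y)$, if the nodes do not mind exchanging their $f_i$'s. Then, (29) and (31) hold. ■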
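As an end-to-end illustration of the ZGS construction, the following sketch simulates (28) with the quadratic choice $g_{\{i,j\}}(y) = \frac{w_{ij}}{2}y^2$ from Example 2. It assumes a hypothetical 4-node cycle with $w_{ij} = 1$ and scalar local functions $f_i(x) = \log(1 + e^{x - c_i}) + \frac{a_i}{2}x^2$, with $a_i$ and $c_i$ chosen arbitrarily here. The simulation starts from $x_i(0) = x_i^*$ as in (22). The paper's algorithms are continuous-time, so the sketch integrates them with a classical fourth-order Runge-Kutta step. That discretization is a choice made here, not part of the method, and it means the gradient sum is conserved only up to integration error.

```python
import math

# Hypothetical scalar (n = 1) objectives f_i(x) = log(1 + e^{x - c_i}) + (a_i / 2) x^2.
A = [1.0, 2.0, 0.5, 1.5]
C = [-2.0, 0.0, 1.0, 3.0]
N = len(A)
# Hypothetical 4-node cycle; g_{i,j}(y) = (w_ij / 2) y^2, so grad g_{i,j}(y) = w_ij * y.
WEIGHTS = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 0): 1.0}


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def df(i, x):
    return sigmoid(x - C[i]) + A[i] * x


def d2f(i, x):
    s = sigmoid(x - C[i])
    return A[i] + s * (1.0 - s)


def bisect_root(g, lo=-100.0, hi=100.0, iters=200):
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def zgs_rhs(x):
    # Right-hand side of (28) with Example 2: inverse Hessian times sum_j w_ij (x_j - x_i).
    flow = [0.0] * N
    for (i, j), w in WEIGHTS.items():
        flow[i] += w * (x[j] - x[i])
        flow[j] += w * (x[i] - x[j])
    return [flow[k] / d2f(k, x[k]) for k in range(N)]


def rk4_step(x, dt):
    # Classical Runge-Kutta step approximating the continuous-time flow.
    k1 = zgs_rhs(x)
    k2 = zgs_rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)])
    k3 = zgs_rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)])
    k4 = zgs_rhs([xi + dt * ki for xi, ki in zip(x, k3)])
    return [xi + dt / 6.0 * (p + 2.0 * q + 2.0 * r + s)
            for xi, p, q, r, s in zip(x, k1, k2, k3, k4)]


x = [bisect_root(lambda u, i=i: df(i, u)) for i in range(N)]    # chi_i = x_i*, as in (22)
x_star = bisect_root(lambda u: sum(df(i, u) for i in range(N)))  # reference value only
for _ in range(4000):                                            # integrate from t = 0 to t = 20
    x = rk4_step(x, 0.005)
print(x, x_star, sum(df(i, xi) for i, xi in enumerate(x)))
```

The nodes never use `x_star`; it is computed only to confirm the limit. A smaller step typically shrinks the drift in $\sum_i \nabla f_i(x_i(t))$ further.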

Examples 3 and 4 below show that some of the continuous-time distributed consensus algorithms in the literature are special cases of ZGS algorithms. In addition, they are just a slight modification away from solving general unconstrained, separable, convex optimization problems:

Example 3. Consider the scalar (i.e., $n = 1$) linear consensus algorithm $\dot x_i(t) = \sum_{j \in \mathcal{N}_i} a_{ij}(x_j(t) - x_i(t))$ $\forall t \ge 0$ $\forall i \in \mathcal{V}$, studied in [12]–[14], [16]. It has symmetric parameters $a_{ij} = a_{ji} > 0$ $\forall \{i,j\} \in \mathcal{E}$ and arbitrary initial states $x_i(0) = y_i$ $\forall i \in \mathcal{V}$. By Definition 1 and Theorem 1, this algorithm is a ZGS algorithm that solves problem (8) for $f_i(x) = \frac{1}{2}(x - y_i)^2$ $\forall i \in \mathcal{V}$.

Moreover, the algorithm is only a Hessian inverse and an initial condition away from solving any convex optimization problem of the form (8) for any $n \ge 1$. The modified algorithm is $\dot x_i(t) = (\nabla^2 f_i(x_i(t)))^{-1} \sum_{j \in \mathcal{N}_i} a_{ij}(x_j(t) - x_i(t))$ with $x_i(0) = x_i^*$. Note that the same can be said about the scalar nonlinear consensus protocol in [?]. ■
Example 4. Consider the multivariable (i.e., $n \ge 1$) weighted-average consensus algorithm $\dot x_i(t) = W_i^{-1} \sum_{j \in \mathcal{N}_i} (x_j(t) - x_i(t))$ $\forall t \ge 0$ $\forall i \in \mathcal{V}$, with $W_i = W_i^T > 0$ and $x_i(0) = y_i$. It was proposed in [15] as a step toward a distributed Kalman filter. This algorithm is a ZGS algorithm that solves problem (8) for $f_i(x) = \frac{1}{2}(x - y_i)^T W_i (x - y_i)$ $\forall i \in \mathcal{V}$. Indeed, it is only two replacements away from solving for general $f_i$'s: $W_i^{-1}$ by $(\nabla^2 f_i(x_i(t)))^{-1}$, and $x_i(0) = y_i$ by $x_i(0) = x_i^*$. ■
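The "slight modification" in Examples 3 and 4 matters once the $f_i$'s are not quadratic. Unmodified linear consensus with symmetric weights conserves the average of the states, so starting from the local minimizers it settles at their average. A ZGS algorithm instead conserves the gradient sum and settles at $x^*$. The sketch below compares the two limits without simulating them. It assumes a hypothetical scalar instance with $N = 4$ and $f_i(x) = \log(1 + e^{x - c_i}) + \frac{a_i}{2}x^2$, with $a_i$ and $c_i$ chosen arbitrarily here. It also checks the quadratic case of Example 3, where the two limits coincide.

```python
import math

# Hypothetical scalar (n = 1) objectives f_i(x) = log(1 + e^{x - c_i}) + (a_i / 2) x^2.
A = [1.0, 2.0, 0.5, 1.5]
C = [-2.0, 0.0, 1.0, 3.0]
N = len(A)


def df(i, x):
    return 1.0 / (1.0 + math.exp(-(x - C[i]))) + A[i] * x


def bisect_root(g, lo=-100.0, hi=100.0, iters=200):
    # Root of a strictly increasing scalar function g with g(lo) < 0 < g(hi).
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


x_local = [bisect_root(lambda u, i=i: df(i, u)) for i in range(N)]  # x_i(0) = x_i*
x_star = bisect_root(lambda u: sum(df(i, u) for i in range(N)))

# Linear consensus with symmetric weights conserves sum_i x_i(t), so from x_i(0) = x_i*
# its limit is the average of the local minimizers.
consensus_limit = sum(x_local) / N
# A ZGS algorithm conserves sum_i f_i'(x_i(t)) = 0 instead, so its limit is x*.
print(consensus_limit, x_star)

# Quadratic case of Example 3: f_i(x) = (x - y_i)^2 / 2 has unit Hessian, so the Hessian
# inverse changes nothing and the minimizer of sum_i f_i is the average of the y_i.
y = [0.4, -1.2, 2.0, 0.3]
x_star_quadratic = bisect_root(lambda u: sum(u - yi for yi in y))
print(x_star_quadratic, sum(y) / N)
```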

5 Convergence Rate Analysis 

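In this section, we derive lower and upper bounds on the exponential convergence rates of the ZGS algorithms described in (28) and Example 2, i.e.,

$$\dot x_i(t) = \big(\nabla^2 f_i(x_i(t))\big)^{-1} \sum_{j \in \mathcal{N}_i} \Big(\nabla g_{\{i,j\}}(x_j(t)) - \nabla g_{\{i,j\}}(x_i(t))\Big), \quad \forall t \ge 0, \ \forall i \in \mathcal{V}. \tag{32}$$

These algorithms form a subset of those in Definition 1, but include the ones in Examples 3 and 4 as a subset. To enable the derivation, suppose an initial state $\mathbf{x}(0) \in \mathcal{M}$ is given (e.g., $\mathbf{x}(0) = (x_1^*, x_2^*, \ldots, x_N^*)$ as in (10) and (22)). With this $\mathbf{x}(0)$, let

$$\mathcal{C}_i = \{x \in \mathbb{R}^n : f_i(x^*) - f_i(x) - \nabla f_i(x)^T (x^* - x) \le V(\mathbf{x}(0))\} \quad \forall i \in \mathcal{V},$$

and let $\mathcal{C} = \operatorname{conv} \bigcup_{i \in \mathcal{V}} \mathcal{C}_i$, where conv denotes the convex hull. It follows from Assumption 1, (12), and (iii) in Theorem 1 that $\mathcal{C}_i$ $\forall i \in \mathcal{V}$ are compact, that $\mathcal{C}$ is convex and compact, and that

$$x_i(t), x^* \in \mathcal{C}_i \subset \mathcal{C}, \quad \forall t \ge 0, \ \forall i \in \mathcal{V}. \tag{33}$$

For each $i \in \mathcal{V}$, due to Assumption 1, (4), and $\mathcal{C}$ being compact, there exists a $\Theta_i \ge \theta_i$ such that

$$\nabla^2 f_i(x) \le \Theta_i I_n, \quad \forall x \in \mathcal{C}. \tag{34}$$

Moreover, for each $\{i,j\} \in \mathcal{E}$, due to (3), $g_{\{i,j\}}$ being locally strongly convex, and $\mathcal{C}$ being convex and compact, there exists a $\gamma_{\{i,j\}} > 0$ such that

$$\big(\nabla g_{\{i,j\}}(y) - \nabla g_{\{i,j\}}(x)\big)^T (y - x) \ge \gamma_{\{i,j\}} \|y - x\|^2, \quad \forall x, y \in \mathcal{C}. \tag{35}$$

Furthermore, for each $\{i,j\} \in \mathcal{E}$, due to (7), (35), $\nabla^2 g_{\{i,j\}}$ being continuous, and $\mathcal{C}$ being convex and compact, there exists a $\Gamma_{\{i,j\}} \ge \gamma_{\{i,j\}}$ such that

$$\nabla^2 g_{\{i,j\}}(x) \le \Gamma_{\{i,j\}} I_n, \quad \forall x \in \mathcal{C}. \tag{36}$$

Observe that the constants $\Theta_i$'s, $\gamma_{\{i,j\}}$'s, and $\Gamma_{\{i,j\}}$'s, unlike the convexity parameters $\theta_i$'s, depend on the initial state $\mathbf{x}(0)$ via the sets $\mathcal{C}$ and $\mathcal{C}_i$'s. Thus, the convergence rate results obtained below depend on $\mathbf{x}(0)$ in general.

One exception is the case where the $f_i$'s and $g_{\{i,j\}}$'s are quadratic functions. Then the $\theta_i$'s, $\Theta_i$'s, $\gamma_{\{i,j\}}$'s, and $\Gamma_{\{i,j\}}$'s may be taken as the smallest and largest eigenvalues of the Hessians of the $f_i$'s and $g_{\{i,j\}}$'s, respectively, independent of $\mathbf{x}(0)$.

Finally, for convenience, let $\theta = \min_{i \in \mathcal{V}} \theta_i$, $\Theta = \max_{i \in \mathcal{V}} \Theta_i$, $\gamma = \min_{\{i,j\} \in \mathcal{E}} \gamma_{\{i,j\}}$, and $\Gamma = \max_{\{i,j\} \in \mathcal{E}} \Gamma_{\{i,j\}}$.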
The following theorem establishes the exponential convergence of the ZGS algorithms (32). It also provides a lower bound $\underline{\rho}$ on their convergence rates, i.e., a rate they can do no worse than:



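Theorem 2. Consider the network modeled in Section 3 and the use of a ZGS algorithm described in (32). Suppose Assumption 1 holds. Then,

$$V(\mathbf{x}(t)) \le V(\mathbf{x}(0))\, e^{-\underline{\rho} t}, \quad \forall t \ge 0, \tag{37}$$

$$\sum_{i \in \mathcal{V}} \theta_i \|x_i(t) - x^*\|^2 \le \sum_{i \in \mathcal{V}} \Theta_i \|x_i(0) - x^*\|^2 e^{-\underline{\rho} t}, \quad \forall t \ge 0, \tag{38}$$

where $\underline{\rho} = \sup\{\varepsilon \in \mathbb{R} : \varepsilon P \le Q\} > 0$. Here $P = [P_{ij}] \in \mathbb{R}^{N \times N}$ is a positive semidefinite matrix given by

$$P_{ij} = \begin{cases} \left(\frac{1}{2} - \frac{1}{N}\right)\Theta_i + \frac{1}{2N^2}\sum_{\ell \in \mathcal{V}} \Theta_\ell, & \text{if } i = j, \\ -\frac{\Theta_i + \Theta_j}{2N} + \frac{1}{2N^2}\sum_{\ell \in \mathcal{V}} \Theta_\ell, & \text{otherwise,} \end{cases} \tag{39}$$

and $Q = [Q_{ij}] \in \mathbb{R}^{N \times N}$ is a positive semidefinite matrix given by

$$Q_{ij} = \begin{cases} \sum_{\ell \in \mathcal{N}_i} \gamma_{\{i,\ell\}}, & \text{if } i = j, \\ -\gamma_{\{i,j\}}, & \text{if } \{i,j\} \in \mathcal{E}, \\ 0, & \text{otherwise.} \end{cases} \tag{40}$$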



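Proof. Let $\eta(t) = \frac{1}{N}\sum_{i \in \mathcal{V}} x_i(t)$ $\forall t \ge 0$. Due to (33) and the convexity of $\mathcal{C}$, $\eta(t) \in \mathcal{C}$. Moreover, by Proposition 1, $\sum_{i \in \mathcal{V}} f_i(x^*) = F(x^*) \le F(\eta(t)) = \sum_{i \in \mathcal{V}} f_i(\eta(t))$. Observe from (17) and (ii) in Theorem 1 that $\sum_{i \in \mathcal{V}} \nabla f_i(x_i(t)) = 0$. Hence $\sum_{i \in \mathcal{V}} \nabla f_i(x_i(t))^T x^* = \sum_{i \in \mathcal{V}} \nabla f_i(x_i(t))^T \eta(t) = 0$. Thus, from (12),

$$V(\mathbf{x}(t)) \le \sum_{i \in \mathcal{V}} \Big(f_i(\eta(t)) - f_i(x_i(t)) - \nabla f_i(x_i(t))^T (\eta(t) - x_i(t))\Big).$$

It follows from (5), (7), (33), (34), and (39) that

$$V(\mathbf{x}(t)) \le \sum_{i \in \mathcal{V}} \frac{\Theta_i}{2}\|\eta(t) - x_i(t)\|^2 = \mathbf{x}(t)^T (P \otimes I_n)\, \mathbf{x}(t), \quad \forall t \ge 0, \tag{41}$$

where $\otimes$ denotes the Kronecker product. Next, using (32), (19), and (30), we can write

$$\dot V(\mathbf{x}(t)) = -\frac{1}{2}\sum_{i \in \mathcal{V}} \sum_{j \in \mathcal{N}_i} \big(x_j(t) - x_i(t)\big)^T \Big(\nabla g_{\{i,j\}}(x_j(t)) - \nabla g_{\{i,j\}}(x_i(t))\Big), \quad \forall t \ge 0. \tag{42}$$

Therefore, from (35), (33), and (42),

$$-\dot V(\mathbf{x}(t)) \ge \frac{1}{2}\sum_{i \in \mathcal{V}} \sum_{j \in \mathcal{N}_i} \gamma_{\{i,j\}} \|x_j(t) - x_i(t)\|^2 = \mathbf{x}(t)^T (Q \otimes I_n)\, \mathbf{x}(t), \quad \forall t \ge 0. \tag{43}$$

To relate (41) and (43), notice from (39) and (40) that both $P$ and $Q$ are symmetric with zero row sums. Also, for all $y = (y_1, y_2, \ldots, y_N) \in \mathbb{R}^N$,

$$y^T P y = \sum_{i \in \mathcal{V}} \frac{\Theta_i}{2}\Big(y_i - \frac{1}{N}\sum_{j \in \mathcal{V}} y_j\Big)^2 \ge 0, \qquad y^T Q y = \frac{1}{2}\sum_{i \in \mathcal{V}} \sum_{j \in \mathcal{N}_i} \gamma_{\{i,j\}} (y_j - y_i)^2 \ge 0.$$

In both cases, equality holds if and only if $y_1 = y_2 = \cdots = y_N$; for $Q$, this uses the connectivity of $\mathcal{G}$. Hence, $P$ and $Q$ are both positive semidefinite, with $N - 1$ positive eigenvalues and one eigenvalue at 0, whose eigenvector is $(1/\sqrt{N}, \ldots, 1/\sqrt{N})$.

It follows that there exists an orthogonal $W \in \mathbb{R}^{N \times N}$ with first column $(1/\sqrt{N}, \ldots, 1/\sqrt{N})$ such that $W^T P W = \operatorname{diag}(0, \bar P)$ and $W^T Q W = \operatorname{diag}(0, \bar Q)$. Here $\bar P, \bar Q \in \mathbb{R}^{(N-1) \times (N-1)}$, $\bar P = \bar P^T > 0$, and $\bar Q = \bar Q^T > 0$. Note that, for all $\varepsilon \in \mathbb{R}$,

$$\varepsilon P \le Q \;\Leftrightarrow\; \varepsilon \bar P \le \bar Q \;\Leftrightarrow\; \varepsilon I_{N-1} \le \bar P^{-1/2} \bar Q \bar P^{-1/2},$$

where $\bar P^{1/2} = (\bar P^{1/2})^T > 0$ is the square root of $\bar P$ via the spectral decomposition, i.e., $\bar P = \bar P^{1/2} \bar P^{1/2}$.

Recall that $\underline{\rho} = \sup\{\varepsilon \in \mathbb{R} : \varepsilon P \le Q\}$ and that $\bar P^{-1/2} \bar Q \bar P^{-1/2} = (\bar P^{-1/2} \bar Q \bar P^{-1/2})^T > 0$. Hence $\underline{\rho}$ is the smallest eigenvalue of $\bar P^{-1/2} \bar Q \bar P^{-1/2}$, which is positive and satisfies $\underline{\rho} P \le Q$. Therefore, $\underline{\rho}(P \otimes I_n) \le Q \otimes I_n$. This, along with (41) and (43), implies $\underline{\rho} V(\mathbf{x}(t)) \le -\dot V(\mathbf{x}(t))$, which by the comparison lemma yields (37).

Finally, due to (2), (5), (12), (33), (34), and (37),

$$\sum_{i \in \mathcal{V}} \frac{\theta_i}{2}\|x_i(t) - x^*\|^2 \le V(\mathbf{x}(t)) \le V(\mathbf{x}(0))\, e^{-\underline{\rho} t} \le \sum_{i \in \mathcal{V}} \frac{\Theta_i}{2}\|x_i(0) - x^*\|^2 e^{-\underline{\rho} t},$$

i.e., (38) holds. □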

The lower bound $\underline{\rho}$ in Theorem 2 can be calculated according to its proof: $\underline{\rho}$ is the smallest eigenvalue of $\bar P^{-1/2} \bar Q \bar P^{-1/2}$. The corollary below gives another lower bound. It is not as tight as $\underline{\rho}$, but it is explicit in the algebraic connectivity $\lambda_2 > 0$ of the graph $\mathcal{G}$:

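Corollary 1. With the setup of Theorem 2,

$$V(\mathbf{x}(t)) \le V(\mathbf{x}(0))\, e^{-\frac{2\gamma}{\Theta}\lambda_2 t}, \quad \forall t \ge 0, \tag{44}$$

$$\|\mathbf{x}(t) - \mathbf{x}^*\| \le \sqrt{\frac{\Theta}{\theta}}\, \|\mathbf{x}(0) - \mathbf{x}^*\|\, e^{-\frac{\gamma}{\Theta}\lambda_2 t}, \quad \forall t \ge 0. \tag{45}$$

Proof. From (41) and (43),

$$V(\mathbf{x}(t)) \le \frac{\Theta}{4N}\sum_{i \in \mathcal{V}} \sum_{j \in \mathcal{V}} \|x_j(t) - x_i(t)\|^2 = \frac{\Theta}{2N}\, \mathbf{x}(t)^T (L_{\mathcal{K}} \otimes I_n)\, \mathbf{x}(t), \quad \forall t \ge 0, \tag{46}$$

$$-\dot V(\mathbf{x}(t)) \ge \frac{\gamma}{2}\sum_{i \in \mathcal{V}} \sum_{j \in \mathcal{N}_i} \|x_j(t) - x_i(t)\|^2 = \gamma\, \mathbf{x}(t)^T (L_{\mathcal{G}} \otimes I_n)\, \mathbf{x}(t), \quad \forall t \ge 0, \tag{47}$$

where $L_{\mathcal{K}} \in \mathbb{R}^{N \times N}$ is the Laplacian of the complete graph with vertex set $\mathcal{V}$, and $L_{\mathcal{G}} \in \mathbb{R}^{N \times N}$ is the Laplacian of $\mathcal{G}$.

Clearly, $L_{\mathcal{K}}$ has $N - 1$ eigenvalues at $N$, and $L_{\mathcal{G}}$ has $N - 1$ positive eigenvalues, among which $\lambda_2$ is the smallest. Both $L_{\mathcal{K}}$ and $L_{\mathcal{G}}$ have one eigenvalue at 0, with eigenvector $(1/\sqrt{N}, \ldots, 1/\sqrt{N})$.

Let $W \in \mathbb{R}^{N \times N}$ contain $N$ orthonormal eigenvectors of $L_{\mathcal{G}}$ in its columns. Then, $W^T L_{\mathcal{K}} W$ and $W^T L_{\mathcal{G}} W$ are diagonal matrices similar to $L_{\mathcal{K}}$ and $L_{\mathcal{G}}$, and both contain the eigenvalue 0 in the same diagonal position. Hence, $\lambda_2 W^T L_{\mathcal{K}} W \le N W^T L_{\mathcal{G}} W$, so that $\lambda_2 L_{\mathcal{K}} \le N L_{\mathcal{G}}$.

Applying this inequality to (46) and (47), we get $\frac{2\gamma\lambda_2}{\Theta} V(\mathbf{x}(t)) \le -\dot V(\mathbf{x}(t))$, which yields (44). Finally, (45) follows from (44) the same way (38) does from (37). □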

Consider the special case where $n = 1$, $f_i(x) = \frac{1}{2}(x - x_i^*)^2$ $\forall i \in \mathcal{V}$, and $g_{\{i,j\}}(x) = \frac{1}{2}x^2$ $\forall \{i,j\} \in \mathcal{E}$. Here we may let the $\theta_i$'s, $\Theta_i$'s, and $\gamma_{\{i,j\}}$'s all be 1. Theorem 2 and Corollary 1 then both yield $\|\mathbf{x}(t) - \mathbf{x}^*\| \le \|\mathbf{x}(0) - \mathbf{x}^*\| e^{-\lambda_2 t}$ $\forall t \ge 0$.

This coincides with the well-known convergence rate result, reported in [12], for the linear consensus algorithm $\dot x_i(t) = \sum_{j \in \mathcal{N}_i} (x_j(t) - x_i(t))$ $\forall t \ge 0$ $\forall i \in \mathcal{V}$. Hence, Theorem 2 and Corollary 1 may be regarded as a generalization of such a result for distributed consensus, to distributed convex optimization.
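The rate $\underline{\rho}$ of Theorem 2 can be computed numerically from (39) and (40). The sketch below makes three assumptions, all introduced here rather than taken from the paper:

- a hypothetical 4-node cycle with $\gamma_{\{i,j\}} = 1$ (e.g., $g_{\{i,j\}}(y) = \frac{1}{2}y^2$);
- hypothetical curvature bounds $\Theta_i$;
- a small hand-written Jacobi eigenvalue routine, used to keep the example dependency-free, where a library symmetric eigensolver would be the natural replacement in practice.

The sketch obtains $\underline{\rho}$ as the smallest nonzero eigenvalue of $P^{+1/2} Q P^{+1/2}$, where $P^{+1/2}$ is the pseudo-inverse square root of $P$. This is equivalent to the $\bar P^{-1/2} \bar Q \bar P^{-1/2}$ characterization in the proof, because $P$ and $Q$ share the null space spanned by the all-ones vector. The sketch also compares $\underline{\rho}$ with the bound $2\gamma\lambda_2/\Theta$ of Corollary 1. Finally, it reproduces $\underline{\rho} = 2\lambda_2$ in the special case with all $\Theta_i = 1$, where $V = \frac{1}{2}\|\mathbf{x} - \mathbf{x}^*\|^2$.

```python
import math


def jacobi_eigh(a, tol=1e-22, max_sweeps=100):
    # Cyclic Jacobi method for a small symmetric matrix given as a list of lists.
    # Returns (eigenvalues, v), with the matching eigenvectors in the columns of v.
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # a <- a J on columns p, q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # a <- J^T a on rows p, q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # v <- v J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def laplacian(n_nodes, weights):
    # Weighted graph Laplacian; weights maps each undirected link (i, j) to its weight.
    lap = [[0.0] * n_nodes for _ in range(n_nodes)]
    for (i, j), w in weights.items():
        lap[i][i] += w
        lap[j][j] += w
        lap[i][j] -= w
        lap[j][i] -= w
    return lap


def matrix_p(big_theta):
    # P in (39), built from the curvature bounds Theta_i.
    n_nodes, total = len(big_theta), sum(big_theta)
    common = total / (2.0 * n_nodes ** 2)
    return [[(0.5 - 1.0 / n_nodes) * big_theta[i] + common if i == j
             else -(big_theta[i] + big_theta[j]) / (2.0 * n_nodes) + common
             for j in range(n_nodes)] for i in range(n_nodes)]


def rho_lower(big_theta, gammas):
    # Smallest nonzero eigenvalue of P^{+1/2} Q P^{+1/2}.
    p = matrix_p(big_theta)
    q = laplacian(len(big_theta), gammas)  # Q in (40) is the gamma-weighted Laplacian
    d, u = jacobi_eigh(p)
    m = len(d)
    scale = [1.0 / math.sqrt(dk) if dk > 1e-10 else 0.0 for dk in d]
    s = [[sum(u[i][k] * scale[k] * u[j][k] for k in range(m)) for j in range(m)]
         for i in range(m)]  # pseudo-inverse square root of P
    eig, _ = jacobi_eigh(matmul(matmul(s, q), s))
    return sorted(eig)[1]  # skip the eigenvalue at 0 (all-ones direction)


CYCLE = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 0): 1.0}  # hypothetical graph, gamma = 1
lambda_2 = sorted(jacobi_eigh(laplacian(4, CYCLE))[0])[1]
big_theta = [1.25, 2.25, 0.75, 1.75]                          # hypothetical Theta_i
rho = rho_lower(big_theta, CYCLE)
bound_cor1 = 2.0 * 1.0 * lambda_2 / max(big_theta)           # 2 * gamma * lambda_2 / Theta
print(rho, bound_cor1)
print(rho_lower([1.0] * 4, CYCLE), 2.0 * lambda_2)           # special case: rho = 2 * lambda_2
```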

The next theorem looks at the performance of the ZGS algorithms (32) from the other end. It provides an upper bound $\bar\rho$ on their exponential convergence rates that mirrors Theorem 2:
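Theorem 3. Consider the network modeled in Section 3 and the use of a ZGS algorithm described in (32). Suppose Assumption 1 holds. Then,

$$V(\mathbf{x}(t)) \ge V(\mathbf{x}(0))\, e^{-\bar\rho t}, \quad \forall t \ge 0, \tag{48}$$

$$\sum_{i \in \mathcal{V}} \Theta_i \|x_i(t) - x^*\|^2 \ge \sum_{i \in \mathcal{V}} \theta_i \|x_i(0) - x^*\|^2 e^{-\bar\rho t}, \quad \forall t \ge 0, \tag{49}$$

where $\bar\rho = \inf\{\varepsilon \in \mathbb{R} : \varepsilon \tilde P \ge \tilde Q\} > 0$. Here $\tilde P \in \mathbb{R}^{N \times N}$ is a positive definite matrix given by $\tilde P = \operatorname{diag}\big(\frac{\theta_1}{2}, \ldots, \frac{\theta_N}{2}\big)$, and $\tilde Q = [\tilde Q_{ij}] \in \mathbb{R}^{N \times N}$ is a positive semidefinite matrix given by

$$\tilde Q_{ij} = \begin{cases} \sum_{\ell \in \mathcal{N}_i} \Gamma_{\{i,\ell\}}, & \text{if } i = j, \\ -\Gamma_{\{i,j\}}, & \text{if } \{i,j\} \in \mathcal{E}, \\ 0, & \text{otherwise.} \end{cases} \tag{50}$$

Proof. From (13), $V(\mathbf{x}(t)) \ge \sum_{i \in \mathcal{V}} \frac{\theta_i}{2}\|x_i(t) - x^*\|^2 = (\mathbf{x}(t) - \mathbf{x}^*)^T (\tilde P \otimes I_n)(\mathbf{x}(t) - \mathbf{x}^*)$ $\forall t \ge 0$. From (6), (33), (36), and (42),

$$-\dot V(\mathbf{x}(t)) \le \frac{1}{2}\sum_{i \in \mathcal{V}} \sum_{j \in \mathcal{N}_i} \Gamma_{\{i,j\}} \|x_j(t) - x_i(t)\|^2 = \frac{1}{2}\sum_{i \in \mathcal{V}} \sum_{j \in \mathcal{N}_i} \Gamma_{\{i,j\}} \big\|(x_j(t) - x^*) - (x_i(t) - x^*)\big\|^2 = (\mathbf{x}(t) - \mathbf{x}^*)^T (\tilde Q \otimes I_n)(\mathbf{x}(t) - \mathbf{x}^*).$$

Like $Q$ in (40), $\tilde Q$ in (50) is symmetric positive semidefinite with exactly one eigenvalue at 0. Thus, so is $\tilde P^{-1/2} \tilde Q \tilde P^{-1/2}$, where $\tilde P^{1/2} = \operatorname{diag}\big(\sqrt{\theta_1/2}, \ldots, \sqrt{\theta_N/2}\big)$ is the square root of $\tilde P$.

Recall that $\bar\rho = \inf\{\varepsilon \in \mathbb{R} : \varepsilon \tilde P \ge \tilde Q\}$, and that for all $\varepsilon \in \mathbb{R}$, $\varepsilon \tilde P \ge \tilde Q \Leftrightarrow \varepsilon I_N \ge \tilde P^{-1/2} \tilde Q \tilde P^{-1/2}$. Hence $\bar\rho$ is the largest eigenvalue of $\tilde P^{-1/2} \tilde Q \tilde P^{-1/2}$, which is positive and such that $\bar\rho \tilde P \ge \tilde Q$. Therefore, $\bar\rho V(\mathbf{x}(t)) \ge -\dot V(\mathbf{x}(t))$, which by the comparison lemma proves (48). Finally, (49) follows from (5), (12), (13), (33), (34), and (48). □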

In contrast to $\underline{\rho}$, the upper bound $\bar\rho$ in Theorem 3 is the largest eigenvalue of $\tilde P^{-1/2} \tilde Q \tilde P^{-1/2}$. The next corollary is to Theorem 3 as Corollary 1 is to Theorem 2. It gives another upper bound that is not as tight as $\bar\rho$, but is explicit in the spectral radius $\lambda_N > 0$ of the graph Laplacian $L_{\mathcal{G}}$:

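Corollary 2. With the setup of Theorem 3,

$$V(\mathbf{x}(t)) \ge V(\mathbf{x}(0))\, e^{-\frac{2\Gamma}{\theta}\lambda_N t}, \quad \forall t \ge 0, \tag{51}$$

$$\|\mathbf{x}(t) - \mathbf{x}^*\| \ge \sqrt{\frac{\theta}{\Theta}}\, \|\mathbf{x}(0) - \mathbf{x}^*\|\, e^{-\frac{\Gamma}{\theta}\lambda_N t}, \quad \forall t \ge 0. \tag{52}$$

Proof. From the proof of Theorem 3, for all $t \ge 0$ we have

$$V(\mathbf{x}(t)) \ge \sum_{i \in \mathcal{V}} \frac{\theta_i}{2}\|x_i(t) - x^*\|^2 \ge \frac{\theta}{2}\|\mathbf{x}(t) - \mathbf{x}^*\|^2$$

and

$$-\dot V(\mathbf{x}(t)) \le \frac{1}{2}\sum_{i \in \mathcal{V}} \sum_{j \in \mathcal{N}_i} \Gamma_{\{i,j\}} \big\|(x_j(t) - x^*) - (x_i(t) - x^*)\big\|^2 \le \Gamma\, (\mathbf{x}(t) - \mathbf{x}^*)^T (L_{\mathcal{G}} \otimes I_n)(\mathbf{x}(t) - \mathbf{x}^*) \le \Gamma \lambda_N \|\mathbf{x}(t) - \mathbf{x}^*\|^2.$$

Consequently, $\frac{2\Gamma\lambda_N}{\theta} V(\mathbf{x}(t)) \ge -\dot V(\mathbf{x}(t))$, implying that (51) and (52) hold. □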

For the special case below Corollary 1, we may also let the $\Gamma_{\{i,j\}}$'s be 1. Theorem 3 and Corollary 2 then both lead to $\|\mathbf{x}(t) - \mathbf{x}^*\| \ge \|\mathbf{x}(0) - \mathbf{x}^*\| e^{-\lambda_N t}$ $\forall t \ge 0$, which is again known.

Finally, note that the above analysis provides a framework for studying the interplay among three groups of quantities:

- network topologies (i.e., $\mathcal{V}$ and $\mathcal{E}$);
- problem characteristics (i.e., the $f_i$'s, $\theta_i$'s, and $\Theta_i$'s);
- ZGS algorithm parameters (i.e., the $g_{\{i,j\}}$'s, $\gamma_{\{i,j\}}$'s, and $\Gamma_{\{i,j\}}$'s).

This interplay may be worthy of further research.
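Likewise, $\bar\rho$ in Theorem 3 reduces to a symmetric eigenvalue problem, since $\tilde P$ is diagonal. The sketch below rests on three assumptions introduced here:

- a hypothetical 4-node cycle with $\Gamma_{\{i,j\}} = 1$ (e.g., $g_{\{i,j\}}(y) = \frac{1}{2}y^2$);
- hypothetical convexity parameters $\theta_i$;
- a small hand-written Jacobi eigenvalue routine, standing in for a library eigensolver.

It checks $\bar\rho$ against the bound $2\Gamma\lambda_N/\theta$ of Corollary 2. It also reproduces $\bar\rho = 2\lambda_N$ in the special case with all $\theta_i = 1$.

```python
import math


def jacobi_eigvals(a, tol=1e-22, max_sweeps=100):
    # Eigenvalues of a small symmetric matrix by the cyclic Jacobi method.
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # a <- a J on columns p, q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # a <- J^T a on rows p, q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def laplacian(n_nodes, weights):
    # Weighted graph Laplacian; weights maps each undirected link (i, j) to its weight.
    lap = [[0.0] * n_nodes for _ in range(n_nodes)]
    for (i, j), w in weights.items():
        lap[i][i] += w
        lap[j][j] += w
        lap[i][j] -= w
        lap[j][i] -= w
    return lap


def rho_upper(small_theta, big_gammas):
    # Largest eigenvalue of Ptilde^{-1/2} Qtilde Ptilde^{-1/2}, where Ptilde = diag(theta_i / 2)
    # and Qtilde is the Gamma-weighted Laplacian in (50).
    q = laplacian(len(small_theta), big_gammas)
    d = [math.sqrt(2.0 / th) for th in small_theta]  # diagonal of Ptilde^{-1/2}
    m = [[d[i] * q[i][j] * d[j] for j in range(len(d))] for i in range(len(d))]
    return max(jacobi_eigvals(m))


CYCLE = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 0): 1.0}  # hypothetical graph, Gamma = 1
lambda_n = max(jacobi_eigvals(laplacian(4, CYCLE)))
small_theta = [1.0, 2.0, 0.5, 1.5]                            # hypothetical theta_i
rho_bar = rho_upper(small_theta, CYCLE)
bound_cor2 = 2.0 * 1.0 * lambda_n / min(small_theta)         # 2 * Gamma * lambda_N / theta
print(rho_bar, bound_cor2)
print(rho_upper([1.0] * 4, CYCLE), 2.0 * lambda_n)           # special case: rho_bar = 2 * lambda_N
```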
6 Conclusion



In this paper, using a convexity-based Lyapunov function candidate, we have developed a set of 
continuous-time ZGS algorithms, which solve a class of distributed convex optimization problems 
over networks. We have established the asymptotic and exponential convergence of these algorithms 
and derived lower and upper bounds on their convergence rates. We have also shown that the 
ZGS algorithms for distributed convex optimization are closely related to the basic algorithms for 
distributed consensus, suggesting that the former may be extended in a number of directions just 
like the latter were, in ways that possibly parallel the latter. 